Abstract


Coding capacity is determined for a class of additive Gaussian channels,
and bounds on capacity are obtained for a class of nonGaussian channels. The
channels may be with or without memory, stationary or nonstationary. The
constraint is partially given in terms of an increasing family of
finite-dimensional subspaces. A general expression for the capacity is obtained,
which depends upon the relation between the noise covariance and the
constraint on the generalized signal-to-noise energy ratio for the code words.
The well-known expression for capacity of the discrete-time stationary
Gaussian channel is shown to be a special case. The general expression
provides new results on capacity for nonstationary discrete-time channels and
for continuous-time channels (stationary or nonstationary) with fixed time of
transmission.
Introduction


Coding  capacity  of  additive  Gaussian  channels  with  memory  is  one  of  the 
major  areas  of  open  problems  in  basic  information  theory.  Even  for  the  case 
of  the  stationary  discrete-time  channel  with  a  simple  energy  constraint,  only 
recently  has  a  complete  proof  been  given  [15]  for  the  information  capacity, 
which  one  can  then  apply  toward  a  rigorous  and  complete  proof  of  the  coding 
capacity. For nonstationary discrete-time and continuous-time channels with
or without memory, there are apparently no published results on coding
capacity.

Moreover,  in  the  classical  continuous-time  channel,  the  model  for  which 
results have been known constitutes a proper subset of the class of stationary
channels,  and  there  is  a  very  large  universe  of  stationary  channels  not 
belonging  to  this  subset. 

This  paper  gives  results  on  coding  capacity  for  a  large  class  of 
channels,  which  may  be  stationary  or  nonstationary,  with  or  without  memory. 

The  formulation  is  somewhat  different  and  more  general  than  that  usually 
followed.  The  generality  permits  one  to  focus  on  channels  where 
dimensionality  of  the  code  word  set  is  a  key  component  of  the  constraints.  In 
the  classical  setup,  the  elements  of  a  code  are  limited  in  their  time 
duration.  The  present  paper  replaces  this  with  a  constraint  defined  by  an 
increasing  family  of  finite-dimensional  subspaces.  The  classical  discrete- 
time  channel  is  then  a  special  case  of  this  framework,  and  several 
applications  to  these  channels  are  given.  These  applications  include 
nonstationary single-user and multi-user channels. For example, it will be
seen that this formulation shows that it is possible to use a code word set of
arbitrarily large cardinality as transmission time $n \to \infty$, with the maximum
decoding error probability converging to zero, while the classical analysis
gives zero capacity and a maximum decoding error of one for any non-zero rate.
Another interesting result, for the memoryless nonstationary Gaussian
channel, is that the noise covariance can have eigenvalues of infinite
multiplicity which have no effect on coding capacity.

The approach also provides results for continuous-time channels. In the
classical continuous-time channel, the transmission time T is permitted to be
arbitrarily long in determining capacity. Then, by transmitting at a rate
below capacity and for a sufficiently long time, the coder has the ability to
use an arbitrarily large code word set while achieving arbitrarily small
maximum decoding error probability. In this formulation, transmission rate is
the rate of increase in the log of the cardinality of the code word set, as
the transmission time is increased.

Suppose,  however,  that  the  transmission  time  T  is  limited,  as  will 
ordinarily be the case in practice. One may then ask: if arbitrarily large
transmission  energy  is  available,  is  it  possible  to  choose  a  code  word  set  of 
arbitrarily  large  cardinality  while  achieving  arbitrarily  small  maximum 
decoding  error  probability?  The  mechanism  for  accomplishing  this,  if  it  is 
possible,  will  consist  of  using  an  increasingly-complex  coding-decoding 
structure.  This  can  be  interpreted  as  an  increase  in  dimensionality  of  the 
code  word  set.  The  "rate"  of  transmission  is  now  the  rate  of  increase  one 
obtains  in  the  log  of  the  cardinality  of  the  code  word  set  as  its 
dimensionality  is  increased.  A  higher  rate  implies  that  the  coding-decoding 
structure  can  be  less  complex  for  a  specified  cardinality  of  the  code  word  set 
and  a  specified  maximum  error  probability. 

It is shown here that for some such channels it is not possible to
achieve arbitrarily small maximum decoding error probability when using code
word sets of arbitrarily large cardinality, for any positive rate. A special
case of these channels is the Holsinger-Gallager model of the stationary
Gaussian channel analyzed in [11] (when the time duration is fixed).

Moreover,  for  such  channels  it  is  shown  that  any  non-zero  rate  leads  to  a 
maximum  decoding  error  probability  of  one.  However,  for  a  large  class  of 
continuous-time channels of fixed time duration, it is possible to achieve
arbitrarily  small  decoding  error  probability  with  code  word  sets  of 
arbitrarily  large  cardinality;  those  channels  are  characterized,  a  number  of 
examples  are  given,  and  their  capacity  is  obtained. 

Upper  bounds  on  coding  capacity  are  obtained  for  a  large  class  of 
nonGaussian  channels.  Several  examples  are  included.  For  the  class  of 
channels  considered,  it  is  shown  that  coding  capacity  is  equal  to  information 
capacity  when  the  noise  is  Gaussian.  Apparently,  this  has  only  recently  been 
explicitly  stated  for  the  classical  discrete-time  channel  (with  memory)  [8]. 

Emphasis here is on obtaining the capacity. However, Theorem 2 gives
bounds on error probability for Gaussian channels based on results of Ebert
[10] and Gallager [11].

In  addition  to  obtaining  specific  new  results  for  coding  capacity  of  a 
large  class  of  additive  channels,  the  development  brings  out  the  essential 
importance  to  the  capacity  of  the  limit  points  of  the  spectrum  (the  essential 
spectrum)  of  the  operator  defining  the  relationship  between  the  noise 
covariance  and  the  energy  constraint  on  the  code  words. 

The  proof  of  the  general  expression  for  the  capacity  is  based  on  the 
spectral  theory  for  self-adjoint  operators  in  Hilbert  space,  including  the 
integral representation (as given, for example, in [17]). That proof, and
those  of  several  other  necessary  mathematical  results,  is  contained  in  [6]. 
The  emphasis  here  is  on  applications. 
Problem Framework


In  the  next  few  sections,  the  setting  and  definitions  for  the  coding 
capacity  problem  will  be  given.  In  order  to  illustrate  these  concepts  and 
definitions,  the  classical  discrete-time  channel  will  be  frequently  employed. 

It is assumed that the noise sample paths belong to a real separable
Hilbert space $H$, where $H$ has inner product $\langle\cdot,\cdot\rangle$ and associated norm $\|\cdot\|$. The
noise is described by a set function $\mu_N$. $\mu_N$ will be a finitely-additive
probability defined on the cylinder sets of $H$: the collection of all sets of
the form $\{x: (\langle x,u_1\rangle, \ldots, \langle x,u_n\rangle) \in D\}$, where $n \ge 1$, $D$ is a Borel set in $\mathbb{R}^n$,
and $u_1, \ldots, u_n$ are any $n$ elements of $H$. Thus, if $H_0$ is any finite-dimensional
subspace of $H$, and $P_0$ is the projection operator in $H$ having range $H_0$, let $\mu_N^0$
be defined on the Borel sets of $H_0$ by $\mu_N^0(A) = \mu_N\{x: P_0 x \in A\}$. $\mu_N^0$ is then a
countably-additive probability. Our basic assumptions on the noise are that

a) $\int_H \langle x,y\rangle^2 \, d\mu_N(x) < \infty$ for all $y$ in $H$, and

b) $\int_H \langle x,y\rangle \, d\mu_N(x) = 0$ for all $y$ in $H$.

Assumption (a) means that $\mu_N$ has a covariance operator $R_N$, which is
linear, bounded, and non-negative, and also implies that the noise mean
exists. Assumption (b) is that the noise has zero mean. We can assume WLOG
(without loss of generality) that $R_N$ is strictly positive on $H$. $R_N$ is defined
as $\langle R_N u, v\rangle = \int_H \langle u,x\rangle\langle v,x\rangle \, d\mu_N(x)$ for $u, v$ in $H$.

$\mu_N$ is Gaussian if for any $y \in H$ the distribution function $P_y$
is Gaussian, where

$$P_y(\alpha) = \mu_N\{x: \langle x,y\rangle \le \alpha\}.$$


Formulation  of  Constraints 

Let $R_W$ be a strictly positive covariance operator in $H$ satisfying
$\mathrm{range}(R_W^{1/2}) \subset \mathrm{range}(R_N^{1/2})$.

$(H_n)$, $n \ge 1$, will denote a family of finite-dimensional subspaces of $H$,
such that for all $n \ge 1$,

a) $H_n \subset H_{n+1}$,

b) $\dim(H_n) = n$.

Let $P_n$ be the projection operator with range equal to $H_n$, and define
$R_{W,n} = P_n R_W P_n$. $R_{W,n}$ is then strictly positive on $H_n$. For $x$ in $H_n$, the norm
$\|x\|_{W,n}$ is defined by $\|x\|_{W,n} = \|y\|$, where $y$ is the unique element of $H_n$ that
satisfies $R_{W,n}^{1/2} y = x$. Formally, we write $\|x\|_{W,n} = \|R_{W,n}^{-1/2} x\|$ for $x$ in $H_n$;
although $R_{W,n}^{-1/2}$ does not exist on $H$, it is well-defined on $H_n$.

The constraints on the code words are now as follows: For each $n \ge 1$,
the admissible code words $x_1, \ldots, x_{K(n)}$ belong to $H_n$ and satisfy

$$\|x_i\|_{W,n}^2 \le nP \quad \text{for } i = 1, 2, \ldots, K(n).$$

As an important example of such constraints, consider the classical
discrete-time memoryless channel with a simple energy constraint. This can be
formulated in the above terms by taking $H = \ell_2$ and $R_W$ the identity, giving

$$H_n = \{x \text{ in } \ell_2: x_i = 0 \text{ for } i > n\},$$

$$\|x\|_{W,n}^2 = \sum_{i=1}^{n} x_i^2 \quad \text{for } x \text{ in } H_n.$$
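As a concrete illustration of this example, the following minimal sketch checks whether a finite list of candidate code words is admissible under the simple energy constraint. The function name and the sample code book are hypothetical and chosen only for illustration; code words are represented as Python lists, and membership in $H_n$ is taken to mean that all coordinates beyond the first $n$ vanish.

```python
def satisfies_energy_constraint(codeword, n, P):
    """Return True if the code word lies in H_n and has sum of squares <= n*P.

    Assumes R_W is the identity, so the W-norm is the ordinary Euclidean norm.
    """
    # Membership in H_n: every coordinate with index > n must be zero.
    if any(x != 0.0 for x in codeword[n:]):
        return False
    return sum(x * x for x in codeword[:n]) <= n * P


# Hypothetical code book in dimension n = 3 with P = 1.5, so that n*P = 4.5.
codebook = [[1.0, -1.0, 1.0], [2.0, 0.0, 0.0], [2.0, 2.0, 0.0]]
admissible = [satisfies_energy_constraint(x, n=3, P=1.5) for x in codebook]
```

For a general strictly positive $R_W$, the same check would use the norm $\|R_{W,n}^{-1/2}x\|$ in place of the Euclidean norm.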

Another example involves the continuous-time channel with fixed
transmission time $T$. In this case, $H$ can be taken as $L_2[0,T]$, and $H_n$ as the
linear span of $\{e_1, e_2, \ldots, e_n\}$, where $\{e_n, n \ge 1\}$ is an infinite orthonormal
set in $L_2[0,T]$. It will always be assumed that a process with paths in
$L_2[0,T]$ is product-measurable.

It can be noted that for channel capacity calculations the constraint
$\|x\|_{W,n}^2 \le nP$ (for code words in $H_n$) for every $n$ is equivalent to the constraint

$$\limsup_n \frac{1}{n}\|x\|_{W,n}^2 \le P.$$

Let $(H_n, \|\cdot\|_{W,n})$ define the constraints. Let $\mu_N$ be the noise probability
on $H$, and let $\mu_N^n$ be the probability on $H_n$ induced from $\mu_N$ by the projection
operator $P_n$: $\mu_N^n[A] = \mu_N\{x: P_n x \in A\}$, for $A$ any Borel set in $H_n$. It is not
required that $\mu_N$ be countably additive; thus, for example, the discrete-time
memoryless stationary Gaussian channel is included in this formulation.

For fixed $n \ge 1$, a code $(k, n, \epsilon)$ [1] is a set of $k$ code words $\{x_1, \ldots, x_k\}$
and $k$ disjoint Borel sets $C_1, \ldots, C_k$ in $H_n$ such that the elements of $\{x_1, \ldots, x_k\}$ obey the
constraints, and

$$\mu_N\{y: P_n y + x_i \in C_i\} \ge 1 - \epsilon \quad \text{for } i = 1, 2, \ldots, k.$$

Note that this last probability inequality can be written as
$\mu_N^n\{y: y + x_i \in C_i\} \ge 1 - \epsilon$, $i = 1, 2, \ldots, k$.

A real number $R \ge 0$ is an admissible rate if there exists an infinite
sequence of codes $([e^{n_i R}], n_i, \epsilon_i)$ with $\epsilon_i \to 0$ as $i \to \infty$, where $[r]$ is the
integer part of the real number $r$.

The coding capacity is then the supremum of the set of admissible rates.

As can be seen, this definition of coding capacity contains that of the
classical discrete-time channel as a special case, defining the constraint
family $(H_n, \|\cdot\|_{W,n})$ as in the previous section. More generally, the capacity
gives an indication of the effect on size of the code word set that can be
obtained by increasing the dimensionality of the code word set, while
requiring that $\liminf_{n\to\infty} \epsilon_n = 0$.

The constraint on the transmitted signal is given in terms of a
covariance operator $R_W$ in $H$. A basic assumption is that $\mathrm{range}(R_N^{1/2})$ contains
$\mathrm{range}(R_W^{1/2})$. The existence of such an operator, and the assumption on range
relations, are necessary in order that the information capacity be finite [5];
moreover, when $\mu_N$ is Gaussian, it will be seen that finite information
capacity is necessary in order to have finite coding capacity. Thus, the
formulation of the problem is quite general (so long as an average-type
constraint is to be used). Under this assumption, there exists a self-adjoint
operator $S$ in $H$ such that $R_N = R_W^{1/2}(I+S)R_W^{1/2}$, where $(I+S)^{-1}$ exists and is bounded
(see [5, Prop. 1] for ramifications of this fact). The limit points of the
spectrum of $S$ will play a key role in this paper. These limit points (the
essential spectrum of $S$) consist of all eigenvalues of infinite multiplicity,
all limits of sequences of distinct eigenvalues, and all points of the
continuous spectrum [17, p. 363]. "Essential spectrum" is the modern terminology
for this set; it will be denoted by $\sigma_{ess}(S)$. The continuous spectrum of $S$
consists of all real numbers $\lambda$ such that the range of $(S - \lambda I)$ is not closed.

In many applications, the constraint will be given by a time-invariant
linear filter $f$ with transfer (frequency) function $\hat{f}$. In such cases, $|\hat{f}|^2$
defines a spectral density and thereby the operator $R_W$.

Expressions  for  Evaluation  of  the  Coding  Capacity 

The noise has probability $\mu_N$ and covariance operator $R_N$. $\mu_N^n$ is the noise
probability on $H_n$, defined by $\mu_N^n(C) = \mu_N\{x: P_n x \in C\}$ for $C$ a Borel set in $H_n$.
$R_W$ is the covariance operator defining the constraint on the code words.
$R_{W,n} = P_n R_W P_n$ and $R_{N,n} = P_n R_N P_n$. Let $I_n$ be the identity in $H_n$; let $S_n$ be the
self-adjoint operator mapping $H_n$ into $H_n$ defined by $R_{N,n} = R_{W,n}^{1/2}(I_n + S_n)R_{W,n}^{1/2}$.
$\|S_n x\| = 0$ if $x$ is orthogonal to $H_n$. Since $S_n$ is self-adjoint as an operator
in $H_n$, it has $n$ orthonormal eigenvectors belonging to $H_n$ with corresponding
eigenvalues $\beta_1^n \le \beta_2^n \le \ldots \le \beta_n^n$.

Absolute continuity of probabilities will be frequently encountered. If
$\mu$ and $\nu$ are two finitely-additive functions on the cylinder sets of $H$, then
$\mu \ll \nu$ and $\mu \sim \nu$ will denote, respectively, absolute continuity of $\mu$ w.r.t. $\nu$
and mutual absolute continuity of $\mu$ and $\nu$. $\mu \ll \nu$ if and only if for every
$\epsilon > 0$ there exists $\delta > 0$ such that $\nu(A) < \delta \Rightarrow \mu(A) < \epsilon$ for any cylinder set $A$
in $H$.


The bound on coding capacity for nonGaussian channels will involve the
relative entropies $H_{GN}(N)$ and $\{H_{GN}^n(N), n \ge 1\}$. Let $\mu_{GN}$ be the Gaussian noise
measure (perhaps not countably additive) having covariance operator $R_N$. In
this framework, the definition is $H_{GN}(N) = \sup_n H_{GN}^n(N)$, where $H_{GN}^n(N)$ is the
entropy of $\mu_N^n$ with respect to $\mu_{GN}^n$:

$$H_{GN}^n(N) = \int_{H_n} \log\left[\frac{d\mu_N^n}{d\mu_{GN}^n}\right] d\mu_N^n.$$

Of course, $H_{GN}^n(N) = \infty$ if $\mu_N^n$ is not absolutely continuous with respect to $\mu_{GN}^n$. $\mu_N' \ll \mu_{GN}'$
is necessary in order to have $H_{GN}(N)$ finite, where $\mu'$ denotes the restriction
of $\mu$ to the closed linear span of $\bigcup_n H_n$.


The relative entropy $H_{GN}^n(N)$ can be defined in terms of differential
entropy for the discrete-time channel. Suppose that $N = (N_1, N_2, \ldots, N_n)$ has
zero mean and a probability density $p_N^n$. Then $d\mu_N^n/d\mu_{GN}^n$ exists, since Lebesgue
measure and nondegenerate Gaussian measure are mutually absolutely continuous
on $\mathbb{R}^n$. The differential entropy is

$$h^n(N) = -\int_{\mathbb{R}^n} [\log p_N^n(x)]\, p_N^n(x)\, d\lambda^n(x),$$

where $\lambda^n$ denotes Lebesgue measure on $\mathbb{R}^n$. Thus,

$$H_{GN}^n(N) = -\int_{\mathbb{R}^n} \log\left[\frac{d\lambda^n}{d\mu_N^n}(x)\right] d\mu_N^n(x) - \int_{\mathbb{R}^n} \log\left[\frac{d\mu_{GN}^n}{d\lambda^n}(x)\right] d\mu_N^n(x) = -h^n(N) + h^n(GN),$$

so that $H_{GN}^n(N) = h^n(GN) - h^n(N)$.
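The identity expressing the relative entropy as the difference between the differential entropy of the Gaussian with the same covariance and that of the noise can be checked on a one-dimensional case. The sketch below is an illustration added here, not an example from the paper: it assumes noise uniform on $[-a, a]$, whose differential entropy is $\log 2a$ and whose variance is $a^2/3$, and it works in nats. The function name is hypothetical.

```python
import math


def relative_entropy_uniform_vs_gaussian(a):
    """Relative entropy (nats) of Uniform[-a, a] w.r.t. the Gaussian of equal variance.

    Computed as h(GN) - h(N), the difference of differential entropies.
    """
    variance = a * a / 3.0                       # variance of Uniform[-a, a]
    h_noise = math.log(2.0 * a)                  # differential entropy of the uniform law
    h_gauss = 0.5 * math.log(2.0 * math.pi * math.e * variance)
    return h_gauss - h_noise


# The result should not depend on the scale a; it reduces to (1/2) log(pi*e/6).
values = [relative_entropy_uniform_vs_gaussian(a) for a in (0.5, 1.0, 4.0)]
closed_form = 0.5 * math.log(math.pi * math.e / 6.0)
```

The value is strictly positive and scale-invariant, as a relative entropy between a non-Gaussian law and its Gaussian counterpart should be.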

The relative entropy will enable us to give an upper bound for
nonGaussian channels such that $\lim_n \frac{1}{n} H_{GN}^n(N) < \infty$. The bound will be seen to be
equal to the capacity for the Gaussian channel with noise probability $\mu_{GN}$
whenever $\lim_n \frac{1}{n} H_{GN}^n(N) = 0$. A particular case of this is when $H_{GN}(N) < \infty$. To
illustrate that this occurs in some important applications, suppose that
$H = L_2[0,T]$ and that $\mu_N$ is defined by a mean-square continuous stochastic
process $(N_t)$. Suppose also that $(N_t) = (V_t + S_t)$, where $(V_t)$ is a m.s.
continuous Gaussian process and $(S_t)$ is a process independent of $(V_t)$ and such
that the paths of $(S_t)$ belong (w.p. 1) to the RKHS of the covariance of $(V_t)$.
Suppose also that the Gaussian process with the $(S_t)$ covariance has sample
paths in the RKHS of the covariance of $(V_t)$, w.p. 1. Then $H_{GN}(N) < \infty$.

This  result  is  a  special  case  of  the  following. 


Prop. 1. For any choice of $(H_n)$, $H_{GN}(N) < \infty$ in each of (a) - (d) below.

(a) $\mu_V$ is Gaussian with covariance $R_V$, $\mu_S$ has covariance $R_S = R_V^{1/2} T R_V^{1/2}$
for $T$ trace-class, and $\mu_N = \mu_V * \mu_S$ (convolution).

(b) $H = \ell_2$ or $H = L_2[0,T]$, $V$ is a Gaussian process with sample
paths a.s. in $H$ and covariance operator $R_V$, $S$ is a possibly non-Gaussian
process independent of $V$ with sample paths a.s. in $H$ and with covariance
operator $R_S$, $N = S + V$, and $R_S = R_V^{1/2} T R_V^{1/2}$ for $T$ trace-class.

(c) $V$, $S$, and $N$ are as in (b), $S'$ is the Gaussian process with the
same covariance function as $S$, and the paths of $S'$ belong to $\mathrm{range}(R_V^{1/2})$
with probability 1.

(d) $H = L_2[0,T]$, $S$, $V$, and $N$ are defined as in (b), $V$ and $S$ are
wide-sense stationary and have rational spectral densities $\Phi_V$ and $\Phi_S$, and

$$\int_{-\infty}^{\infty} \frac{\Phi_S(\lambda)}{\Phi_V(\lambda)}\, d\lambda < \infty.$$


Cod. Cap. of  CACs-Ll ss-36  -  10/9/89  -  9 


Proof. (b) and (c) are equivalent [2]. (d) is a special case of (b) [2, 13].
(b) is obviously a special case of (a). The proof will be given for (a),
writing $\mu_N$ as $\mu_{S+V}$. Let $\mu_V \circ f_x^{-1}(A) = \mu_V\{y: x + y \in A\}$, where $x \in H$ and
$f_x(y) = x + y$. The following statements follow from [2]:

a)  VC  ~  ae' 

b)  My  ~  4^ 

c)  4^  ~  Pi¬ 
cons  lder  now  the  channel  with  additive  Gaussian  noise  4^,  and  let  its 

information  capacity  C(P)  be  sup  1(4^},  where  the  sup  is  over  all 
probabilities  4^  such  that 

IIR^xll.  The  map  g:  (x.y)  -»  [ (d4y°fx* )/d4y](y )  is  IB[H]xlB[H]/lB[IR]  measurable 
[4],  For  any  4^  satisfying  the  constraint,  we  have  [3]  1(4^,)  = 

'A  Trace  -  Hy(X+V) .  Trace  R^R^R^  £  P,  from  the  constraint.  Since 

1(4^)  't  0,  this  requires  Hy(X+V)  <  «.  Finally,  since  N  =  X  +  V, 


4x[range(R^)]  =  1  and  E^llxll^ 


<  P,  where  llxllM  = 
N 


hgn(n)  *  JI‘°*  35^K  -  I(log 


^  ^  1 


-  VN>  -  J(‘°*  as^H  <■  VN> 


log 


Since  4y  and  4^  are  mutually  absolutely  continuous  and  Gaussian,  [^(V)  <  <» 
Hence,  H^(N)  <  ®.  D 


The model just described arises in one of the most frequently-encountered
nonGaussian situations: when the medium contains additive narrowband
nonGaussian noise $(S_t)$ that is independent of the additive wideband Gaussian
receiver noise $(V_t)$. The above case applies, for example, if the receiver
noise is stationary with spectral density $\Phi_V(\lambda) = 1/(\lambda^2 + \alpha^2)$, and the ambient
medium noise is stationary and nonGaussian with spectral density
$\Phi_S(\lambda) = 1/B(\lambda)$, where $B(\lambda)$ is a polynomial of order $\ge 4$.

Consider now the finite-dimensional channel defined as follows. The
additive noise has probability $\mu_N^n$. The input to the channel is described by a
probability $\mu_X$ on $H_n$, satisfying $E_{\mu_X}\|x\|_{W,n}^2 \le nP$. Let $C_W^n(nP)$ be the
information capacity of this channel. The following well-known result is
fundamental to our results.


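Lemma 1 [11, 14].

$$\frac{1}{2}\sum_{i=1}^{N(n)} \log\left[\frac{B(n)+1}{\beta_i^n+1}\right] \le C_W^n(nP) \le \frac{1}{2}\sum_{i=1}^{N(n)} \log\left[\frac{B(n)+1}{\beta_i^n+1}\right] + H_{GN}^n(N),$$

where $\beta_1^n \le \beta_2^n \le \ldots \le \beta_n^n$ are the eigenvalues of $S_n$, $N(n) = \sup\{i \le n: \beta_i^n \le B(n)\}$,
and $B(n)$ is defined by

$$P = \frac{1}{n}\sum_{i=1}^{N(n)} (B(n) - \beta_i^n).$$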

Preliminary  Results 

Our program is to first obtain expressions for $\lim_n \frac{1}{n} C_W^n(nP)$. This will
then be followed by the result that the coding capacity is equal to
$\lim_n \frac{1}{n} C_W^n(nP)$ when $\mu_N$ is Gaussian and that this value is an upper bound for
coding capacity for the nonGaussian processes satisfying $\lim_n \frac{1}{n} H_{GN}^n(N) = 0$.
Among the difficulties in evaluation of $\lim_n \frac{1}{n} C_W^n(nP)$ is that for each value of
$n$ one obtains an expression in terms of the eigenvalues of the operator $S_n$.
The range of $S_n$ is contained in $H_n$; $S_n$ always has a complete orthonormal set
of eigenvectors, and its range space has dimension $\le n$. Moreover, the
eigenvalues of $S_n$ need not be contained in the set of eigenvalues of $S_{n+1}$; in
fact, the eigenvalues of $S_{n+1}$ may not include a single eigenvalue of $S_n$.
The desired result is an expression for capacity in terms of the operator
$S$, where $R_N = R_W^{1/2}(I+S)R_W^{1/2}$, and the increasing subspaces $(H_n)$. This requires
that one first determine relations between $S$ and $S_n$, and in particular,
between the spectral properties of the two operators. Some idea of the
complexity of this procedure may be gained by observing that, in general, $S$
need not have any eigenvectors. Examples of such $S$ include the following:
$H = \ell_2$ (discrete-time channel) with $S$ a Toeplitz matrix; $H = L_2[0,T]$, with
$[Sx](t) = t^r x(t)$ a.e. $dt$, some real number $r > 0$.

In [6], it is shown that $S_n = V_n S V_n^*$, where $V_n$ is a partially isometric
operator. $V_n$ is isometric on $\mathrm{range}(R_W^{1/2}P_n)$, zero on $[\mathrm{range}(R_W^{1/2}P_n)]^\perp$, and its
range is equal to $H_n$. Let $H_n^W = \mathrm{range}(R_W^{1/2}P_n)$. The eigenvalues $(\beta_i^n)$ of $S_n$ and
their multiplicity are then the same as those of the operator $P_n^W S P_n^W$, where
$P_n^W$ is the projection operator with range equal to $H_n^W$ [6].

$\{e_i, i \ge 1\}$ will be used to denote an o.n. set in $H$ such that $H_n =
\mathrm{span}\{e_1, \ldots, e_n\}$. Similarly, $\{u_i, i \ge 1\}$ will denote an o.n. set such that
$H_n^W = \mathrm{span}\{u_1, \ldots, u_n\}$. Note that one of these sets is complete for $H$ if and
only if the other is complete; completeness is equivalent to $\|P_n x - x\| \to 0$ for
all $x$ in $H$.

Since $V_n$ is an isometry from $H_n^W$ to $H_n$, it follows that $V_n u_i = e_i$ for
$i \le n$. This is obviously true for $n = 1$; suppose that it holds for $n = K$.
Then, since $V_K \subset V_{K+1}$, and $H_K \subset H_{K+1}$, the statement must hold for $n = K+1$,
thus for all values of $n$.


Let $n$ be fixed and define $G_n(\lambda) = \frac{1}{n}[\#$ eigenvalues of $S_n < \lambda]$. $G_n$ is a
left-continuous non-negative step function, bounded above by 1. The family
$\{G_n, n \ge 1\}$ will be seen to completely define (for a given set of constraints)
the coding capacity. The importance of $\sigma_{ess}(S)$ to characterizing $\{G_n, n \ge 1\}$
is demonstrated in Prop. 2 below. First, let $V[S,(H_n)]$ be the set of all $\gamma$ in
$\mathbb{R}$ such that for any $K$ and any $\epsilon > 0$, there exists $n \ge K$ such that the number
of elements in the sequence $(\beta_i^n)$ satisfying $|\beta_i^n - \gamma| < \epsilon$ is $\ge K$.
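For a fixed $n$, both the step function $G_n$ and the count that appears in the definition of $V[S,(H_n)]$ are simple functions of the eigenvalue list of $S_n$. The sketch below illustrates them on a small hypothetical eigenvalue list; the function names are assumptions made for this illustration only.

```python
def G_n(eigenvalues, lam):
    """Fraction of the eigenvalues strictly less than lam (left-continuous in lam)."""
    return sum(1 for b in eigenvalues if b < lam) / len(eigenvalues)


def count_near(eigenvalues, gamma, eps):
    """Number of eigenvalues b with |b - gamma| < eps."""
    return sum(1 for b in eigenvalues if abs(b - gamma) < eps)


# Hypothetical eigenvalues of S_n for n = 5, with a repeated eigenvalue at 0.
betas = [-0.5, 0.0, 0.0, 0.25, 2.0]
at_zero = G_n(betas, 0.0)          # counts only -0.5, since the inequality is strict
just_above = G_n(betas, 0.1)       # now also counts the two eigenvalues at 0
cluster = count_near(betas, 0.0, 0.3)
```

A point $\gamma$ belongs to $V[S,(H_n)]$ when counts of this kind, taken along suitable $n$, exceed every fixed $K$ for every $\epsilon > 0$.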

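To see that the set $V[S,(H_n)]$ determines $\lim_n \frac{1}{n}[\#$ eigenvalues of $S_n < \lambda]$,
let $\lambda$ be fixed. Then $\lim_n G_n(\lambda) = \lim_n \frac{1}{n}\sum_{i=1}^{k} [\#$ eigenvalues of $S_n$ in
$[\lambda_i, \lambda_{i+1})]$, where $-1 = \lambda_1 < \ldots < \lambda_{k+1} = \lambda$. If $\lim_n G_n(\lambda) > 0$, then there
exists at least one $\lambda_i < \lambda$ such that $\lim_n \frac{1}{n}[\#$ eigenvalues in $[\lambda_i, \lambda_{i+1})] > 0$.
This requires that $[\lambda_i, \lambda_{i+1}]$ contain at least one point in $V[S,(H_n)]$. Thus,
for any $\lambda$, $\lim_n G_n(\lambda)$ is determined entirely by those $\gamma < \lambda$ such that
$\lim_n \frac{1}{n}[\#$ eigenvalues of $S_n$ in $(\gamma - \alpha, \gamma + \alpha)] > 0$ for every $\alpha > 0$, and any such $\gamma$
must belong to $V[S,(H_n)]$.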


Prop. 2 [6].

(1) Suppose that $\{u_n, n \ge 1\}$ is a c.o.n. set for $H$. Then
$\sigma_{ess}(S) \subset V[S,(H_n)]$.

(2) Let $\theta_{\min}$ and $\theta_{\max}$ denote the smallest and largest points in $\sigma_{ess}(S)$.
Then $V[S,(H_n)] \subset [\theta_{\min}, \theta_{\max}]$.


The results of Prop. 2 might lead one to hypothesize that
$V[S,(H_n)] \subset \sigma_{ess}(S)$, that $V[S,(H_n)] = \sigma_{ess}(S)$ when $\{u_n, n \ge 1\}$ is complete,
and that $\lim_n \frac{1}{n}[\#$ eigenvalues of $S_n < \lambda]$ is independent of the choice of any
c.o.n.s. $\{u_n, n \ge 1\}$. These three properties would be very useful.
Unfortunately, all three are false, in general, although the first two will
hold for some important choices of $(H_n)$.

(1) If $\{u_n, n \ge 1\}$ is not complete for $H$, then $V[S,(H_n)] \cap \sigma_{ess}(S) = \emptyset$
is possible.

(2) If $\{u_n, n \ge 1\}$ is complete for $H$, then:

(a) $V[S,(H_n)] \subset \sigma_{ess}(S)$ is not always true,

(b) Let $Q_\Delta(S,(H_n))$ be the number of eigenvalues $\beta_i^n$ of $S_n$ such that
$|\beta_i^n - x| \ge \Delta > 0$ for all $x$ in $\sigma_{ess}(S)$. Then $\lim_n \frac{1}{n} Q_\Delta(S,(H_n))$ can
be strictly positive.

(c) If $\{u_n, n \ge 1\}$ is complete for $H$, then $\lim_n \frac{1}{n}\{\#$ eigenvalues of
$S_n > \lambda\}$ need not be independent of the choice of $\{u_n, n \ge 1\}$.


Coding  Capacity 

The  following  theorems  give  a  general  result  for  coding  capacity.  See 
[6]  for  the  proof  of  Theorem  1  and  the  Corollary. 


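Theorem 1.

(1)
$$\overline{\lim_n}\; \frac{1}{2}\int_{-1}^{B_n} \log\left[\frac{B_n+1}{\lambda+1}\right] dF_n(\lambda) \le \overline{\lim_n}\; \frac{1}{n} C_W^n(nP) \le \overline{\lim_n}\; \frac{1}{2}\int_{-1}^{B_n} \log\left[\frac{B_n+1}{\lambda+1}\right] dF_n(\lambda) + \overline{\lim_n}\; \frac{1}{n} H_{GN}^n(N),$$

where $B_n$ is defined by $P = \int_{-1}^{B_n} (B_n - \lambda)\, dF_n(\lambda)$
and $F_n(\lambda) = \frac{1}{n}\{\#$ eigenvalues of $S_n \le \lambda\}$.

(2) If $\overline{\lim}_n \frac{1}{n} C_W^n(nP) > 0$, and $\lim_{n\to\infty} \frac{1}{n}\{\#$ eigenvalues of $S_n \le \lambda\}$ exists for
all $\lambda$ in $\mathbb{R}$, then

$$\frac{1}{2}\int_{-1}^{B} \log\left[\frac{B+1}{\lambda+1}\right] dF(\lambda) \le \overline{\lim_n}\; \frac{1}{n} C_W^n(nP) \le \frac{1}{2}\int_{-1}^{B} \log\left[\frac{B+1}{\lambda+1}\right] dF(\lambda) + \overline{\lim_n}\; \frac{1}{n} H_{GN}^n(N),$$

where $F$ is a distribution function defined by

$$F(\lambda) = \lim_{n\to\infty} \frac{1}{n}\{\# \text{ eigenvalues of } S_n \le \lambda\} = \lim_{n\to\infty} F_n(\lambda),$$

and the constant $B$ is defined by

$$P = \int_{-1}^{B} [B - \lambda]\, dF(\lambda).$$

(3) If $\overline{\lim}_n \frac{1}{n} H_{GN}^n(N) = 0$, then $\overline{\lim}_n \frac{1}{n} C_W^n(nP) = 0$ if and only if
$\lim_{n\to\infty} \frac{1}{n}\{\#$ eigenvalues of $S_n \le \lambda\} = 0$ for all $\lambda$ in $\mathbb{R}$. This requires
that $S$ be unbounded and occurs in particular if $+\infty$ is the only limit
point of $\sigma(S)$.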

Remarks. (1) In part (2), the same result holds if $F(\lambda) = \lim_{n\to\infty} F_n(\lambda)$ exists
for all $\lambda \le B$, where $B$ is defined by $P = \int_{-1}^{B} [B - \lambda]\, dF(\lambda)$.

(2) In the statement of Theorem 1, the probability distributions
$\{F_n, n \ge 1\}$ could be replaced by $\{G_n, n \ge 1\}$, where $G_n(\lambda) = \frac{1}{n}\{\#$ eigenvalues of $S_n$
strictly less than $\lambda\}$. This follows because the integrands are continuous
functions, and are zero at the upper limit of the integral.

Corollary. If $\lim_n \frac{1}{n} H_{GN}^n(N) = 0$, then bounds on $\lim_n \frac{1}{n} C_W^n(nP)$ are given by

*  log(l  +  P/X^)  $  Tim  ~  c{(nP)  $  V<  log(l  +  P/Xmin) 
where $\lambda_{\min}$ is the smallest limit point in the spectrum of $S$, and $\lambda_{\max}$ is
the largest. Moreover, these bounds can be attained by proper choice of
$(H_n)$.

An  alternative  form  of  Theorem  1  can  be  given,  as  follows. 


Theorem  1A.  Suppose  that  B  is  the  largest  number  such  that 
_ B 

P  2  lim  J  [B  -X]dF  (X).  Then 
n  -1 


_  B  ra 

lim  J*_1  log  — 
n 


•B  +1 


X+l 


dFn(X)  *  lim  £cg(nP) 
n 

n  1  J 


where $(F_n)$ and $(B_n)$ are defined as in Theorem 1. If no such $B$ exists,
and $\overline{\lim}_n \frac{1}{n} H_{GN}^n(N) = 0$, then $\overline{\lim}_n \frac{1}{n} C_W^n(nP) = 0$.
The  following  result,  together  with  Theorem  1  (and  1A),  gives  the  coding 
capacity.  Part  (b)  of  this  theorem  can  be  proved  from  first  principles, 
beginning  with  Feinstein's  Lemma.  However,  Theorem  1  enables  a  proof  to  be 
given based upon results due to Ebert [10] and Gallager [11]. This approach
not  only  shortens  the  exposition,  but  also  provides  error  bounds. 


Theorem 2. Let $C_W^\infty(P)$ be the coding capacity.

(a) $C_W^\infty(P) \le \overline{\lim}_n \frac{1}{n} C_W^n(nP)$; when $\mu_N$ is Gaussian and $C_W^\infty(P) = 0$, then the
maximum decoding error probability is equal to one for any rate $> 0$
(i.e., $\liminf_n \epsilon_n = 1$).

(b) If $\mu_N$ is Gaussian, then $C_W^\infty(P) = \overline{\lim}_n \frac{1}{n} C_W^n(nP)$.

(c) Suppose that $\mu_N$ is Gaussian. Then, for any fixed $n$, the maximum
decoding error probability $\epsilon_n$ is bounded by


/  2e 

£  £  - 

n  i 


s  6 


n 


exp  -T[R(B[r.p]) 


where  0  £  p  £  1,  s  =  p/(2[l+p]Z(l  +  B[n.p]).  5  can  be  taken  equal 
to  1/s,  and  is  an  integration-normalizing  constraint  [11]. 
B[n,p]  is  defined  by 


,  N(n,p)  (l+p)2(B[n.p]-/3")(B[n.p]+l) 
P  =  -  2  1 


n  i=l  (l+p)(l+B[n,p])  -  p( 1+/3?) 
N(n,p)  =  sup{ i  £  n:  0^  <  B[n,p]> 


and 


r[R(B[n.p])]  =  £S£___  +  *  ^  log  [■ 


N(n.p) 


H-Bfn.p] 


(l+p)(l+B[n,p]-P(l+p^)J 
The  corresponding  number  of  code  words  is  [eR^B^n<P])j  where 

R(B[n,p])  =  /(l'P)log[^l+  1  . 

i=l  1  0?  +  1 

Proof. The fact that $\overline{\lim}_n \frac{1}{n} C_W^n(nP)$ is always an upper bound on coding capacity
follows by a standard application of Fano's inequality; see, for example, p.
168 of [16]. The resulting inequality for a code $(k_n, n, \epsilon_n)$ is

$$\epsilon_n \ge 1 - \frac{C_W^n(nP) + \log 2}{\log k_n}.$$

This gives $\overline{\lim}_n \frac{1}{n} C_W^n(nP)$ as an upper bound on capacity.

Suppose now that $\mu_N$ is Gaussian, and assume that (b) of the theorem holds. If
$C_W^\infty(P) = 0$, then for any positive number $R$, $\frac{1}{n} C_W^n(nP) < R$ for all sufficiently
large $n$, so that $\lim_n \frac{1}{n} C_W^n(nP) = 0$, giving $\epsilon_n \to 1$ for any positive rate $R$.
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To  obtain  the  lower  bound  for  CZ^(P)  when  4^  is  Gaussian,  one  can  apply 

the  results  of  Ebert  [10],  [11].  They  involve  a  vector  channel,  consisting  of 

a  set  of  n  parallel  one-dimensional  Gaussian  channels  where  the  noises  in  the 

different  channels  are  mutually  independent  with  variances  i  <  n.  The 

_  2 

code  words  are  vectors  x  which  satisfy  the  constraint  27  x7  <  nP.  With  this 

1  1  _ 

model,  Ebert  shows  the  existence  of  a  code  ([e^^^n,^^],n,e  ) ,  with  R(B[n,p]) 
defined  as  in  (c)  of  the  theorem,  and  e.  obeying  the  upper  bound  given  there. 

In those equations, $\rho = 0$ gives $\Gamma[R(B(n))] = 0$ and $P = \frac{1}{n}\sum_{i=1}^{N(n)} [B(n) - \beta_i^n]$. $\Gamma[R(B[n,\rho])] > 0$ for $\rho > 0$, and $B[n,\rho]$ decreases (for fixed $P$) as $\rho$ increases. Thus, for every $\rho > 0$, one has an admissible rate $R_\rho$, defined by

$$R_\rho = \lim_n \frac{1}{n} R(B[n,\rho]) = \text{(by Theorem 1)}\ \lim_n \frac{1}{2}\int_{-1}^{B[n,\rho]} \log\left[\frac{B[n,\rho]+1}{\lambda+1}\right] dF_n(\lambda).$$


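This gives a lower bound on capacity of

$$\lim_{\rho\to 0} R_\rho = \lim_{\rho\to 0}\lim_n \frac{1}{2}\int_{-1}^{B[n,\rho]} \log\left[\frac{B[n,\rho]+1}{\lambda+1}\right] dF_n(\lambda).$$

Since $\int_{-1}^{B[n,\rho]} \log\left[\frac{B[n,\rho]+1}{\lambda+1}\right] dF_n(\lambda)$ is non-decreasing as $\rho$ decreases for fixed $n$, it follows that

$$\lim_{\rho\to 0} R_\rho \le \lim_n \frac{1}{2}\int_{-1}^{B[n,0]} \log\left[\frac{B[n,0]+1}{\lambda+1}\right] dF_n(\lambda).$$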


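Moreover, if for $\Delta_n \ge 0$, $B[n,0] + 1 - \Delta_n > 0$, $B[n,\rho] = B[n,0] - \Delta_n$, then

$$\lim_n \int_{-1}^{B[n,0]} \log\left[\frac{B[n,0]+1}{\lambda+1}\right] dF_n(\lambda) - \lim_n \int_{-1}^{B[n,\rho]} \log\left[\frac{B[n,\rho]+1}{\lambda+1}\right] dF_n(\lambda)$$

$$\le \lim_n \int_{-1}^{B[n,0]} \log\left[\frac{B[n,0]+1}{B[n,0]+1-\Delta_n}\right] dF_n(\lambda) \le \lim_n \left[\frac{B[n,0]+1}{B[n,0]+1-\Delta_n} - 1\right],$$

and since $B[n,0]$ is bounded away from $-1$, we obtain

$$\lim_{\rho\to 0} R_\rho = \lim_n \frac{1}{2}\int_{-1}^{B[n,0]} \log\left[\frac{B[n,0]+1}{\lambda+1}\right] dF_n(\lambda),$$

where $P = \frac{1}{n}\sum_{i=1}^{N(n)} [B[n,0] - \beta_i^n]$ and $N(n) = \sup\{i \le n: \beta_i^n < B[n,0]\}$.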

To apply this result, one proceeds similarly to [11]. For fixed $n$, the channel considered here has code words in $H_n$ constrained by $\|x\|_{W,n}^2 = \|y\|^2 \le nP$, where $y$ is the unique element of $H_n$ satisfying $R_{W,n}^{1/2} y = x$. The noise has covariance operator $R_{N,n} = R_{W,n}^{1/2}(I + S_n)R_{W,n}^{1/2}$. Thus, this is the same as the channel with code words $y_1, \ldots, y_{k_n}$ satisfying $\|y\|^2 \le nP$ and with the additive noise having covariance $I + S_n$. Expanding all code words and noise sample paths in terms of the orthonormal eigenvectors of $I + S_n$, one obtains a channel whose output is the sum of a vector of $n$ independent parallel Gaussian channels, with the outputs of the $n$ channels being mutually-orthogonal elements of $H_n$. The $i$th channel has additive noise with covariance operator $(1 + \beta_i^n)v_i^n \otimes v_i^n$, and the code words $(y_j)$ can be written as $y_j = \sum_{i=1}^n y_{ji} v_i^n$, where $\sum_{i=1}^n y_{ji}^2 \le nP$, $y_{ji} v_i^n$ being the input to the $i$th parallel channel when the code word $y_j$ is selected. Since the individual channels have outputs that are mutually orthogonal in $H_n$, the probability of correct decoding for the summed output is the probability that all of the individual channel outputs are correctly decoded. The Ebert results thus apply, and part (b) of the theorem is proved. Part (c) follows. □


Applications:  Discrete-Time  Channels 

For the discrete-time memoryless Gaussian channel with $R_W = I$, the theorems give easily-obtained new results for nonstationary channels. In this case, $R_N = I+S$; since $S$ is diagonal, the eigenvalues of $I + S_n$ are $(\sigma_i)$, $i \le n$, where $I+S = \mathrm{diag}[\sigma_1, \sigma_2, \ldots]$. The spectral limit points $\{\theta_1, \ldots, \theta_K\}$ of $I+S$ are the limit points of the eigenvalues $(\sigma_i)$ of $R_N$. These limit points and their "relative frequencies" completely characterize the capacity for this simple channel, whereas in the general case the family of distribution functions $(F_n)$ can converge to a distribution function with points of increase at points that are not limit points of the spectrum of $I + S$. For the stationary memoryless discrete-time channel with $R_W = I$ and $R_N = \sigma^2 I$, $\sigma^2$ is the only limit point of $I + S$, and so by Prop. 2 one obtains the well-known result that $C_W^\infty(P) = \frac{1}{2}\log\left[1 + \frac{P}{\sigma^2}\right]$. This is also the value of $C_W^\infty(P)$ if $R_W = I$ and $R_N = \sigma^2 I + M$, where $M$ is any operator in $H$ such that $M$ is compact. This follows from the fact that compact self-adjoint operators in a Hilbert space are exactly those self-adjoint operators that have zero as the only limit point of their spectrum. Thus, if the noise is of the form $N = N_1 + N_2$, $N_1$ stationary and uncorrelated with variance $\sigma^2$, and $N_2$ independent of $N_1$ with $E\|N_2\|^2 < \infty$, then the coding capacity is again $\frac{1}{2}\log\left[1 + \frac{P}{\sigma^2}\right]$. Of course, we are assuming as always that all processes have zero mean.

These  remarks  follow  from  the  following  result. 

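Prop. 4. Let $H = \ell_2$, $H_n = \{x: x_i = 0 \text{ for } i > n\}$, $R_W = I$, and $R_N = \mathrm{diag}[\sigma_i, i \ge 1]$. Suppose that $\mu_N$ is Gaussian and that $(\sigma_i)$ has the limit points $\theta_1 < \theta_2 < \ldots < \theta_K$. Then

$$C_W^\infty(P) = \overline{\lim}_n \frac{1}{2}\sum_{j=1}^{J} \pi_j^n \log\left[\frac{B_n}{\theta_j}\right]$$

where $\pi_j^n = M_j^n/n$, $M_j^n$ is the number of elements in the sequence $(\sigma_i, i \le n)$ belonging to $(\theta_j - \epsilon, \theta_j + \epsilon)$, $\epsilon > 0$ satisfies $2\epsilon < \min\{\theta_{j+1} - \theta_j: j \ge 0, \theta_0 = 0\}$, $(B_n)$ is defined by $P = \sum_{j=1}^{J} \pi_j^n(B_n - \theta_j)$, and $J$ (which depends on $n$) is the largest integer $i \le K$ such that $P \ge \sum_{j=1}^{i} \pi_j^n(\theta_i - \theta_j)$.

If $\lim_n \pi_i^n = \pi_i$ exists for $i \le J$ with $J$ as defined below, then

$$C_W^\infty(P) = \frac{1}{2}\sum_{j=1}^{J} \pi_j \log\left[\frac{B}{\theta_j}\right]$$

where $J$ and $B$ are defined by $P = \sum_{j=1}^{J} \pi_j(B - \theta_j)$, with $\theta_J$ the largest element of $\{\theta_1, \ldots, \theta_K\}$ satisfying $P \ge \sum_{j=1}^{J} \pi_j(\theta_J - \theta_j)$.

Proof. Direct application of the theorems.

The case where $(\sigma_n)$ has an infinite set of limit points is presumably of marginal interest; the capacity in that case is less easy to visualize, but is also obtained immediately from the theorems.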
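The limiting form of Prop. 4 is a water-filling computation over the limit points, weighted by their relative frequencies. The sketch below assumes the limit points are given in increasing order as `thetas`, the limiting relative frequencies as `pis` (summing to one), and $P > 0$; the function name and argument names are illustrative only.

```python
import math

def prop4_capacity(P, thetas, pis):
    """Water-filling over limit points; returns (J, B, capacity in nats).

    thetas: limit points of the noise variances, ascending.
    pis:    limiting relative frequencies of those limit points.
    """
    K = len(thetas)
    J = 0
    for i in range(1, K + 1):
        # power needed to raise the water level up to thetas[i - 1]
        need = sum(pis[j] * (thetas[i - 1] - thetas[j]) for j in range(i))
        if P >= need:
            J = i
    weight = sum(pis[:J])
    B = (P + sum(p * t for p, t in zip(pis[:J], thetas[:J]))) / weight
    C = 0.5 * sum(p * math.log(B / t) for p, t in zip(pis[:J], thetas[:J]))
    return J, B, C
```

For a single limit point $\sigma^2$ with frequency one, this reduces to $\frac{1}{2}\log[1 + P/\sigma^2]$. With two limit points, a small $P$ leaves the larger one unused ($J = 1$), while the level $B$ is still scaled by the fraction of time the smaller one is present.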

Heuristically, one can view this channel as equivalent to $K$ parallel independent discrete-time memoryless channels. The $k$th channel has non-zero noise components only for those indices $j$ such that $|\sigma_j - \theta_k| < \epsilon$. For fixed $n$, a code word for the composite channel is then given by $y = (y_1, y_2, \ldots, y_K)$, where $y_k$ is the component of the code word for the $k$th channel and must satisfy $y_{kj} = 0$ if $|\sigma_j - \theta_k| \ge \epsilon$, $y_{kj} = 0$ if $j > n$, while $\sum_{k=1}^{K}\sum_{j=1}^{n} y_{kj}^2 \le nP$.

The effect of Prop. 4 is then to replace the original channel by $K$ parallel independent channels, the $k$th channel being a memoryless stationary channel with noise variance $\theta_k$. The coder then uses the $K$ channels according to the probability distributions $(\pi_k^n)$, the $(\theta_k)$, and $P$.

Using this viewpoint, the results of Prop. 4 then show that the coder chooses his code word as $y = (y_1, y_2, \ldots, y_K)$, where $y_k$ is the component of the code word that is used as an input to the $k$th channel. For fixed $n$, he chooses the code word $y$ according to the constraint $\frac{1}{n}\sum_{j=1}^{n} y_{kj}^2 \le \pi_k^n(B_n - \theta_k)$, $k \le J$, with $y_{kj} = 0$ if $j > n$ or $|\sigma_j - \theta_k| \ge \epsilon$, and $\sum_{j=1}^{n} y_{kj}^2 = 0$, $k > J$. In the case where $\lim_n \pi_k^n = \pi_k$ exists for $k \le J$, this gives the constraint $\frac{1}{n}\sum_{j=1}^{n} y_{kj}^2 \le \pi_k(B - \theta_k)$, $k \le J$; $\sum_{j=1}^{n} y_{kj}^2 = 0$, $k > J$, where $B = \left[P + \sum_{j=1}^{J}\pi_j\theta_j\right] / \sum_{j=1}^{J}\pi_j$. The capacity is then $C_W^\infty(P) = \sum_{j=1}^{J} C_j^\infty(P)$, where $C_j^\infty(P)$ is the capacity of the $j$th channel subject to the above constraint.

If the sequence of noise variances $(\sigma_i, i \ge 1)$ consists only of numbers in the finite set $(\theta_i, i \le K)$, then Prop. 4 shows that even for a noise variance $\theta_i$ that is repeated infinitely often, such a component of the noise will have no effect on capacity if the relative frequencies $(\pi_i^n)$ are such that $\lim_n \pi_i^n = 0$. However, if $P$ is small and a limit point $\theta_i$ of $(\sigma_k, k \ge 1)$ is so large that it does not appear in the expression for capacity (i.e., $i > J$ as given in Prop. 4), then this limit point may still affect the capacity if $\overline{\lim}_n \pi_i^n > 0$, and will always affect capacity if $\liminf_n \pi_i^n > 0$. This may be viewed as somewhat unexpected, since such a $\theta_i$ would represent one of the "$K$ parallel channels" for which the effective input is zero. However, this is a point where the heuristic "parallel channel" analogy breaks down; this is due to the fact that the "$i$th channel" is present for a fraction $\pi_i^n$ of the available time, $n$, and the coder is defining capacity in terms of transmission time (i.e., part of the allowable dimensionality is being used by a "channel" which conveys no information).

Prop. 4 (and the theorems, for more general channels) has obvious applications to some multiuser channels. For example, consider a time-division multiaccess Gaussian channel defined as follows. There are $K$ sources. For transmission up to time $t = n$, with $n \ge 1$, the $j$th source uses the channel a fraction $\pi_j^n = n(j)/n$ of the time. The noise added to the code word of the $j$th source has variance $\theta_j$ (for the $n(j)$th transmission by the $j$th source) when the overall transmission time is $n$. The sources have the overall constraint $\frac{1}{n}\sum_{j=1}^{K}\sum_{i=1}^{n(j)} x_{ji}^2 \le P$ for each $n \ge 1$. Prop. 4 then shows how the available power should be allocated among the $K$ sources, and gives the capacity. Examples of such channels include that defined by an earth-orbiting satellite with $K$ widely-separated ground-based transmitters, or a channel where $K$ sources feed into a central relay station.

This example is for a very simple case of multiaccess channels. More general problems can be analyzed. However, the basic idea is the same; one identifies a source (or group of sources) with a limit point (or set of limit points) in $\sigma(S)$, and the corresponding $(\pi^n)$, $n \ge 1$, is the fraction of time the source uses the channel up to time $n$. This is for the memoryless discrete-time channel; the theorems can be used to analyze more general channels. A particular aspect of this model is that the fraction of time that each source uses the channel can vary with time; similarly for the noise environment faced by each source.

One can also use Prop. 4 (and the theorems) to analyze jamming channels. For example, if a jammer must vary his energy over different time periods, Prop. 4 will permit the calculation of capacity for a given set of $(\pi, \theta, P)$. More generally, as will be discussed elsewhere, the theorems permit one to determine the jammer's minimax strategy, subject to various types of constraints on the jamming signal.

It can be seen that the best choice of $(H_n)$ from the viewpoint of maximizing capacity will be the natural choice $H_n = \{x: x_i = 0, i > n\}$ only in special situations. If $\theta_1 < \theta_2 < \ldots < \theta_K$ are the limit points of the noise variances, and $\mu_N$ is Gaussian, then an optimum choice of $(H_n)$ is given by $H_n = A_n \cap B$, where

$$A_n = \{x: x_i = 0, i > k(n)\}$$

and

$$B = \{x: x_i = 0, |\sigma_i - \theta_1| \ge \epsilon\},$$

$k(n)$ = smallest integer such that $|\sigma_i - \theta_1| < \epsilon$ for exactly $n$ values of $i \le k(n)$,

$$\epsilon < \theta_2 - \theta_1.$$

This definition of $(H_n)$ gives

$$C_W^\infty(P) = \frac{1}{2}\log[1 + P/\theta_1],$$

which is the maximum possible value for a given $S$. Of course, this squares perfectly with intuition: the original channel is transformed into a "channel" having limiting noise variance $\theta_1$, at the expense of increasing the transmission time required to achieve a specified decoding error.

The choice of $H_n = \{x: x_i = 0, i > n\}$ gives the optimum $(H_n)$ when $\mu_N$ is Gaussian if and only if $\overline{\lim}_n \pi_1^n = 1$. Thus, if choice of $(H_n)$ is part of the system design, then it is only in this case that the capacity is equal to that which is obtained in the classical Gaussian channel. Conversely, the classical channel gives the worst possible choice of $(H_n)$ if and only if $\lim_n \pi_i^n = 0$ for all $i \le K-1$. For example, consider a channel with noise variances $(\sigma_i)$ given as follows:

$$\sigma_i = 2, \quad i = j^2,\ j \text{ any integer}; \qquad \sigma_i = 2000 \quad \text{otherwise}.$$

Then $\theta_1 = 2$, $\theta_2 = 2000$; if $(H_n)$ is defined by $H_n = \{x: x_i = 0, i > n\}$, then $\lim_n \pi_1^n = 0$, so that (by Prop. 4) $C_W^\infty(P) = \frac{1}{2}\log[1 + P/2000]$, which is close to zero for moderate $P$. However, consider $(H_n)$ defined by

$$H_n = \{x: x_i = 0, i > n^2\} \cap B, \quad \text{where} \quad B = \{x: x_i = 0,\ i \ne j^2 \text{ for any integer } j\}.$$

If $\mu_N$ is Gaussian, then this is an optimum choice of $(H_n)$ and gives $C_W^\infty(P) = \frac{1}{2}\log[1 + P/2]$. One should notice the differences in definition of capacity. However, examples that are even more striking can be constructed for channels with memory; it is possible to use a code word set of arbitrarily large cardinality with maximum decoding error going to zero as transmission time $n \to \infty$, even though the classical channel has zero capacity. This can be seen from Theorem 1.
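The example with noise variance 2 on the perfect-square indices can be checked numerically. The sketch below, with an arbitrarily chosen $P = 1$, computes the relative frequency of the low-noise indices among the first $n$ coordinates and compares the two limiting values obtained by applying Prop. 4 to the natural subspaces (frequencies tending to 0 and 1) and to the subspaces restricted to the square indices. The names are illustrative only.

```python
import math

def pi1(n):
    """Fraction of indices i <= n that are perfect squares (noise variance 2)."""
    return math.isqrt(n) / n

P = 1.0  # illustrative power level

# Natural choice H_n = {x: x_i = 0, i > n}: the low-noise indices have
# vanishing relative frequency, so only theta_2 = 2000 matters in the limit.
natural = 0.5 * math.log(1 + P / 2000)

# H_n supported on the first n square indices: every used coordinate has
# noise variance theta_1 = 2, but n dimensions occupy n**2 time units.
restricted = 0.5 * math.log(1 + P / 2)

for n in (10**2, 10**4, 10**6):
    print(n, pi1(n), "time units needed by restricted choice:", n * n)
print(natural, restricted)
```

The gain in rate per dimension is paid for in transmission time, which grows as $n^2$ for the restricted subspaces.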

In the above analyses, the cost of transmission time is not quantified. In the classical channel, one is implicitly assuming that transmission time must be minimized for a code word set of dimension $n$. The formulation given here permits one to remove this constraint. When $\mu_N$ is Gaussian, Theorem 2 can be used to determine tradeoffs between the transmission time, maximum decoding error, and cardinality of the code word set, for each $n$, for various choices of $(H_n)$.

As a final remark on the memoryless channel, one may note that for the stationary nonGaussian channel with noise variance $\sigma^2$, Shannon obtained the result [18] that for $R_W = I$,

$$\lim_{n\to\infty} \frac{1}{n} C_W^n(nP) \le C(P,G) + \frac{1}{2}\log e^{2H(G)-2H(N)}$$

where $G$ is the zero-mean Gaussian random variable with variance $\sigma^2$, $C(P,G)$ is the capacity when $G$ is the noise, and $H$ denotes differential entropy. As previously seen,

$$\frac{1}{2}\log e^{2H(G)-2H(N)} = H_G(N) = \frac{1}{n}H_G^n(N);$$

here $H_G(N)$ denotes the relative entropy of the random variable $N$ to the r.v. $G$; the last equality follows by the channel being memoryless.

The theorems also permit an immediate calculation of the known result for capacity of the stationary discrete-time channel with memory when a simple energy constraint is used.


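Prop. 5. Let $H = \ell_2$, $R_W = I$, and $H_n = \{x \text{ in } \ell_2: x_i = 0, i > n\}$, with $R_N$ given by a spectral density $\Phi$. Then, if $\lim_n \frac{1}{n} H_G^n(N) = 0$,

$$\lim_n \frac{1}{n} C_W^n(nP) = \frac{1}{4\pi}\int_{\{x:\ \Phi(x) \le B_0\}} \log\left[\frac{B_0}{\Phi(x)}\right] dx$$

where $B_0$ satisfies

$$P = \frac{1}{2\pi}\int_{\{x:\ \Phi(x) \le B_0\}} [B_0 - \Phi(x)]\,dx.$$

For this application, the distribution function $F$ is defined by $2\pi F(x) = m\{y: \Phi(y) \le x\}$, where $m$ is Lebesgue measure on $[-\pi, \pi]$.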


Proof. In this case, $P_n = V_n$, so

$$G_n(\lambda) = \frac{1}{n}\{\text{number of eigenvalues of } P_n(I+S)P_n < \lambda + 1\}.$$

Now, by the Toeplitz distribution theorem [12],

$$\lim_n G_n(\lambda) = \frac{1}{2\pi}\, m\{x \in [-\pi,\pi]: \Phi(x) < \lambda + 1\},$$

where $m$ is Lebesgue measure, and the result follows from Theorem 1.
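The water-filling level $B_0$ and the resulting limit in Prop. 5 can be approximated numerically for a given spectral density. The following is a rough sketch using a midpoint Riemann sum on $[-\pi, \pi]$ and bisection on $B_0$; the function name, the grid size, and the requirement that `phi` be strictly positive are assumptions of the sketch, not part of the proposition.

```python
import math

def spectral_waterfill(phi, P, grid=4000):
    """Return (B0, limit in nats) for a positive spectral density phi on [-pi, pi]."""
    dx = 2 * math.pi / grid
    vals = [phi(-math.pi + (k + 0.5) * dx) for k in range(grid)]

    def power(B):
        # (1 / 2 pi) * integral of [B - phi(x)] over {x: phi(x) <= B}
        return sum(B - v for v in vals if v <= B) * dx / (2 * math.pi)

    lo, hi = min(vals), max(vals) + P  # power(lo) = 0 <= P <= power(hi)
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if power(mid) < P:
            lo = mid
        else:
            hi = mid
    B0 = 0.5 * (lo + hi)
    # (1 / 4 pi) * integral of log[B0 / phi(x)] over {x: phi(x) <= B0}
    cap = sum(math.log(B0 / v) for v in vals if v <= B0) * dx / (4 * math.pi)
    return B0, cap
```

For a flat density $\Phi \equiv \sigma^2$ the sketch returns $B_0 = \sigma^2 + P$ and $\frac{1}{2}\log[1 + P/\sigma^2]$, matching the memoryless stationary case.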


Bounds on capacity of this channel can also be given. Suppose that $m \le \Phi(x) \le M$, $|x| \le \pi$. If $\lim_n \frac{1}{n} H_G^n(N) = 0$, then

$$\frac{1}{2}\log\left[1 + \frac{P}{M}\right] \le \lim_n \frac{1}{n} C_W^n(nP) \le \frac{1}{2}\log\left[1 + \frac{P}{m}\right].$$

For $\mu_N$ Gaussian, the fact that $\lim_n \frac{1}{n} C_W^n(nP) = \frac{1}{4\pi}\int_{-\pi}^{\pi} I_{[0,B]}(\Phi(\lambda)) \log\left[\frac{B}{\Phi(\lambda)}\right] d\lambda$ with $P = \frac{1}{2\pi}\int_{-\pi}^{\pi} I_{[0,B]}(\Phi(\lambda))[B - \Phi(\lambda)]\,d\lambda$ has been known/assumed for many years. It is credited to Pinsker under the assumption of a stationary Gaussian signal [15]. However, it is apparently only recently that a complete proof has been given allowing a general nonstationary signal process [15]. In [15], it is assumed that $\Phi$ is continuous; that assumption is not needed here.

Applications: Continuous-Time Channels of Fixed Duration

In this section the code words and noise paths are elements of $L_2[0,T]$, where $T < \infty$ is fixed. The available energy per transmitted code word is $P_0$: for each fixed $n$ and a given $(H_n)$ and $R_W$, one has $\|x\|_W^2 \le P_0$ for each code word $x$. The question is whether or not arbitrarily small maximum decoding error probability can be achieved by making $P_0$ arbitrarily large without limiting the cardinality of the code word set. It can be assumed that limiting the cardinality of the code word set is equivalent to limiting its dimensionality.

This problem is fundamentally different from that of the classical continuous-time channel, wherein the code words are limited to an energy of $TP$ and $T$ is permitted to become arbitrarily large. In the present case, for a fixed value of $P_0$, one can set $P_0 = nP$. Theorem 2 can then be used to determine an upper bound on the maximum decoding error probability. Of course, this requires that the eigenvalues $(\beta_i^n)$ be determined for sufficiently many values of $n$, so that the expressions given in Theorem 2 can be evaluated. For a given value of $P_0 = nP$, one then determines $B[n,\rho]$ for suitable values of $\rho$ ($\rho$ in $(0,1)$) and chooses the values of $n$ and $\rho$ that give the most satisfactory compromise between the size of the code word set and the maximum decoding error probability.

In the balance of this section, attention will be focused on determining capacity. As previously discussed, a "rate" is the rate of increase in log[cardinality of the code word set] as a function of increasing dimensionality. $R$ is then an admissible rate if the maximum decoding error can be made arbitrarily small by indefinitely increasing log[cardinality of the code word set] at the rate $R$. From Theorem 2, if $\mu_N$ is Gaussian and the capacity is zero, then the maximum probability of decoding error converges to one as the cardinality of the code word set becomes arbitrarily large, for any positive rate. We begin with an example illustrating this situation.

Let $H = L_2[0,T]$ and suppose that $(N_t)$ has covariance operator having an inverse which is a densely-defined differential operator of order $2p$. For example, if $p = 1$, $N$ could have covariance function $e^{-\alpha|t-s|}$ ($\alpha > 0$) or $\min(t,s)$. Let $R_W$ be an integral operator whose inverse is a densely-defined differential operator of order $\ge 4p$. Thus, if $p = 1$, $R_W$ could be defined by a covariance function corresponding to a spectral density which behaves, for $|\lambda| \to \infty$, as $\Phi_W(\lambda) = 1/\lambda^{4k}$, where $k \ge 1$. Then, if $(N_t)$ is Gaussian, or more generally when $\lim_n \frac{1}{n} H_G^n(N) = 0$, the capacity is zero, regardless of the definition of the subspaces $(H_n)$. This result follows from the fact that $R_N = R_W^{1/2}(I+S)R_W^{1/2}$, where $I+S$ is the inverse of a compact operator, and thus has a single limit point in its spectrum, equal to $+\infty$.

Some stationary channels with rational spectral densities defining both $R_N$ and $R_W$ constitute a special case of the above example. A complete set of results can be given for all stationary channels where $R_W$ and $R_N$ are defined by rational spectral densities.

Prop. 6. Suppose that $H = L_2[0,T]$. Let $(N_t)$ be stationary and Gaussian with rational spectral density $\Phi_N$, and suppose that $R_W$ is defined by a rational spectral density $\Phi_W$. Then, for any choice of $(H_n)$:

a) the coding capacity will be non-zero if and only if

$$\lim_{|\lambda|\to\infty} \frac{\Phi_W(\lambda)}{\Phi_N(\lambda)} = a > 0;$$

b) the capacity is given by $C_W^\infty(P) = \frac{1}{2}\log[1 + aP]$.

This also gives an upper bound on the coding capacity if $N$ is nonGaussian and the following conditions are satisfied: $(N_t) = (G_t + V_t)$ where $(G_t)$ is stationary and Gaussian with rational spectral density $\Phi_G$, $(V_t)$ is independent of $(G_t)$, possibly nonGaussian, stationary or nonstationary,


and  such  that  E  J" 


jvfuii 


dX  <  “  for  the  sample  paths  v  of  (V^),  where 


v  is  the  Fourier  transform  of  v. 


Then the upper bound is $\frac{1}{2}\log\left[1 + P\lim_{|\lambda|\to\infty} \frac{\Phi_W(\lambda)}{\Phi_G(\lambda)}\right]$.


Proof. When $R_W$ and $R_N$ are defined by rational spectral densities $\Phi_W$ and $\Phi_N$, then it can be shown from well-known results [2], [13] that $R_W = R_N^{1/2}(I+V)R_N^{1/2}$, where the operator $V$ has the following properties:

a) $V$ is bounded if and only if $\{\Phi_W(\lambda)/\Phi_N(\lambda), |\lambda| > 0\}$ is bounded;

b) if $\Phi_W/\Phi_N$ is integrable over $(-\infty, \infty)$, then $I+V$ is trace-class;

c) if $V$ is bounded and

$$\lim_{|\lambda|\to\infty} \frac{\Phi_W(\lambda)}{\Phi_N(\lambda)} = a,$$

then $I+V$ has a single limit point for its spectrum, equal to $a$.

Using these facts, one notes that since $R_N = R_W^{1/2}(I+V)^{-1}R_W^{1/2} = R_W^{1/2}(I+S)R_W^{1/2}$, $S$ must be unbounded with the single limit point $+\infty$ if $\Phi_W/\Phi_N$ is integrable (since $I+V$ has only zero as a limit point). By Theorem 1 and Prop. 2, $C_W^\infty(P)$ is then zero. If $I+V$ has $a$ as its only limit point, then $I + S = (I+V)^{-1}$ must have $a^{-1}$ as its only limit point (note that $(I+V) - aI = V + (1-a)I$ is compact); (b) of Prop. 6 follows. The remainder of Prop. 6 can be obtained from the results of [13]. □


As can be seen, there is no "waterfilling" aspect to the statement of Prop. 6. This is because the operator $S$ has only a single limit point in its spectrum if $H = L_2[0,T]$ and $R_W$ and $R_N$ are defined by rational spectral densities. The waterfilling interpretation can be applied to Theorem 1; it is in terms of the family of pure jump distribution functions $(F_n)$, which is determined from $V[S,(H_n)]$. It is notable that the results of Prop. 6 are independent of the value of $T$.


From (a) of Prop. 6, $C_W^\infty(P) = 0$ if $\Phi_W/\Phi_N$ is integrable. This is the class of channels considered in the Holsinger-Gallager result for the classical continuous-time channel [11, Sec. 8.5], $T \to \infty$. It may be judged only natural that $C_W^\infty(P) = 0$ if $T$ is fixed, since $\lim_{T\to\infty} \frac{1}{T} C_W^T(TP)$ is finite, where $C_W^T(TP)$ is the capacity for the channel restricted to the interval $[0,T]$ and with the constraint $E\|X\|_W^2 \le PT$. Since the dimensionality is not constrained in computing $C_W^T(TP)$, $C_W^T(TP) \ge C_W^n(nP)$ when $T = n$. Moreover, when $T$ is fixed, the capacity is determined in terms of $(C_W^n(nP))$. However, if $R_W$ and $R_N$ are given by rational spectral densities such that $\lim_{|\lambda|\to\infty} \Phi_W(\lambda)/\Phi_N(\lambda)$ is finite and non-zero, then the capacity is finite and non-zero for both the classical channel [7] and (by Prop. 6) the fixed-time channel.

Another  result  along  these  lines  is  the  following. 


Prop. 7. Suppose that $\lim_n \frac{1}{n} H_G^n(N) = 0$, that $\{u_n, n \ge 1\}$ is complete, and that $\mu_W$ is the zero-mean Gaussian probability with $R_W$ as covariance. Then, $\lim_n \frac{1}{n} C_W^n(nP) = \frac{1}{2}\log[1+P]$ if $\mu_N$ is absolutely continuous w.r.t. (with respect to) $\mu_W$. More generally, $\lim_n \frac{1}{n} C_W^n(nP) = \frac{1}{2}\log\left[1 + \frac{P}{1-a}\right]$ if there exists a constant $a < 1$ such that $R_N + aR_W$ is strictly positive definite and $\mu_W$ and $\mu^{G,a}$ are mutually absolutely continuous, where $\mu^{G,a}$ is the zero-mean Gaussian probability with covariance operator $R_N + aR_W$.


Proof. If $\mu_W \sim \mu^{G,a}$, then $R_N + aR_W = R_W^{1/2}(I+T)R_W^{1/2}$, $T$ Hilbert-Schmidt, so that $R_N = R_W^{1/2}(I + [T - aI])R_W^{1/2}$; since $T - aI = S$ has $-a$ as the only limit point of $\sigma(S)$, the result follows. Note that $a < 1$ is necessary because $I + T - aI$ must be non-negative, requiring $T \ge (a-1)I$. Since $T$ is compact, this can hold only for $a \le 1$, and the case $a = 1$ violates the basic (and necessary, for finite capacity) assumption that $R_N = R_W^{1/2}(I+S)R_W^{1/2}$ with $(I+S)^{-1}$ bounded. For, if $a = 1$, then $R_N + aR_W = R_W + R_W^{1/2}TR_W^{1/2}$ and $T$ Hilbert-Schmidt implies that $R_N = R_W^{1/2}TR_W^{1/2}$, and $T^{-1}$ cannot be bounded. □


The coding capacity of the matched channel ($R_N = R_W$) is $\frac{1}{2}\log(1+P)$. Thus, if $\mu_N \sim \mu_W$, then the difference between the two operators $R_N$ and $R_W$ is not sufficient to affect the coding capacity. However, if $\mu_N \perp \mu_W$, then one still obtains finite capacity under the assumptions of Prop. 7, but its value can be greater or smaller than that of the matched channel.

Prop. 8. Suppose that $\lim_n \frac{1}{n} H_G^n(N) = 0$. In order that $C_W^\infty(P)$ be more than zero, it is necessary that $R_W = R_N^{1/2} T R_N^{1/2}$ with $T$ bounded but not compact.

Prop. 8 follows immediately from the preceding. $(I+S)^{-1} = T$, so that $T$ compact implies that $(I+S)$ has $+\infty$ as the only limit point of its spectrum. This is actually the situation that holds in the Holsinger-Gallager model.

The key to interpreting these results lies in the expressions for information capacity when $\mu_N$ is Gaussian [5]. When $S$ has a single limit point equal to $+\infty$ (as in the case of Example 2 and in Prop. 6 when $\Phi_W/\Phi_N$ is integrable), then $S$ has a CONS of eigenvectors and corresponding eigenvalues $(\lambda_i)$, $\lambda_i \uparrow \infty$. With no dimensionality constraint on the transmitted signal process, Theorem 3 and Corollary 4 of [5] show that the optimum signal process (for achieving information capacity) has finite-dimensional support when $P_0 = nP$ is fixed.

It then follows that increasing the dimensionality of the signal space past the optimum value actually decreases information capacity. This is consistent with $\lim_{n\to\infty} \frac{1}{n} C_W^n(nP) = 0$ for any fixed value of $P$.

However, when the smallest limit point $\theta$ of $S$ is finite, then for a sufficiently large value of $P_0 = nP$, one will have $P_0 + \sum_{i=1}^{K} \lambda_i \ge K\theta$, where now $(\lambda_i)$ denotes those (increasing) eigenvalues of $S$ strictly less than $\theta$. Theorem 2c of [5] then applies, setting the dimensionality as $n = M$. Let $K = \min(L, M)$, where $L$ is the number of eigenvalues of $S$ strictly less than $\theta$. Then, permitting $H_M$ to be any $M$-dimensional subspace, one has [5]

$$C_W^M(P_0) = \frac{1}{2}\sum_{i=1}^{K} \log\left[\frac{1+\theta}{1+\lambda_i}\right] + \frac{M}{2}\log\left[1 + \frac{P_0 + \sum_{i=1}^{K}(\lambda_i - \theta)}{M(1+\theta)}\right]$$

with the constraint given by $E\|X\|_W^2 \le P_0$. Since $P_0 = MP$, taking $M \to \infty$ gives

$$\lim_M \frac{1}{M} C_W^M(MP) = \frac{1}{2}\log\left[1 + \frac{P}{1+\theta}\right].$$
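The passage to the limit can be checked numerically. The sketch below assumes the finite-dimensional information capacity has the two-term form $\frac{1}{2}\sum_{i \le K}\log[(1+\theta)/(1+\lambda_i)] + \frac{M}{2}\log[1 + (P_0 + \sum_{i \le K}(\lambda_i - \theta))/(M(1+\theta))]$, with `lams` holding the finitely many eigenvalues of $S$ below $\theta$ in increasing order; the function name and the sample numbers are illustrative only.

```python
import math

def info_capacity(P0, M, theta, lams):
    """Finite-dimensional information capacity in nats (assumed two-term form)."""
    K = min(len(lams), M)
    used = lams[:K]
    if P0 + sum(used) < K * theta:
        raise ValueError("P0 too small for this expression to apply")
    first = 0.5 * sum(math.log((1 + theta) / (1 + lam)) for lam in used)
    excess = P0 + sum(lam - theta for lam in used)
    second = 0.5 * M * math.log(1 + excess / (M * (1 + theta)))
    return first + second

theta, lams, P = 1.0, [0.2, 0.5], 2.0
limit = 0.5 * math.log(1 + P / (1 + theta))
for M in (10, 10**3, 10**6):
    print(M, info_capacity(M * P, M, theta, lams) / M, limit)
```

The first term is bounded when only finitely many eigenvalues lie below $\theta$, so after division by $M$ only the second term survives in the limit.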


Of course, this is an upper bound for the coding capacity, in general, since $(H_M)$ can be any sequence such that $H_M$ is $M$-dimensional, including sequences that are not ordered by inclusion. However, if $S$ has only one limit point $\theta$ in its spectrum (as in Prop. 6 when $\Phi_W(\lambda)/\Phi_N(\lambda) \to a \ne 0$) then $\frac{1}{2}\log\left[1 + \frac{P}{1+\theta}\right]$ is in fact the coding capacity, as has been seen from the corollary to Theorem 1.

The preceding results thus enable one to determine whether or not arbitrarily small error probability can be achieved while indefinitely increasing the log of the cardinality of the code word set at some positive rate, and give the capacity. As discussed, the capacity in this framework is the supremum of all admissible rates, and the "rate" is defined as $[\log k_n]/n$, where $n$ is the dimensionality of the code word set and $k_n$ is the number of code words.